A morphological approach for segmentation and tracking of human faces
A new technique for segmenting and tracking human faces in video sequences is presented. The technique relies on morphological tools: connected operators to extract the connected component most likely to belong to a face, and partition projection to track this component through the sequence. A binary partition tree (BPT) is used to implement the connected operator. The BPT is constructed using chrominance criteria, and its nodes are analyzed so that the selected node maximizes an estimate of the likelihood of being part of a face. Tracking is performed using a partition projection approach: images are divided into face and non-face parts, which are tracked through the sequence. The technique has been successfully assessed on several test sequences from the MPEG-4 (raw format) and MPEG-7 (MPEG-1 format) databases.
Saliency maps on image hierarchies
© 2015 Elsevier B.V. All rights reserved.
In this paper we propose two saliency models for salient object segmentation based on a hierarchical image segmentation: a tree-like structure that represents regions at different scales, from fine details to the whole image (e.g. gPb-UCM, BPT). The first model is based on a hierarchy of image partitions. The saliency at each level is computed on a region basis, taking into account the contrast between regions. The maps obtained for the different partitions are then integrated into a final saliency map. The second model works directly on the structure created by the segmentation algorithm, computing saliency at each node and integrating these cues in a straightforward manner into a single saliency map. We show that the proposed models produce high-quality saliency maps. Objective evaluation demonstrates that the two methods achieve state-of-the-art performance on several benchmark datasets.
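The region-basis contrast computation used by the first model can be sketched generically: each region's saliency is its area-weighted feature contrast to all other regions in a partition. This is a simplified stand-in, assuming mean-feature regions; the paper's actual criteria and the cross-level fusion are not reproduced here.

```python
import numpy as np

def region_contrast_saliency(features, sizes):
    """Per-region saliency as area-weighted feature contrast to all other
    regions in one partition level (a generic sketch; the paper computes this
    over gPb-UCM/BPT hierarchies and fuses the per-level maps)."""
    n = len(features)
    sal = np.zeros(n)
    for i in range(n):
        d = np.linalg.norm(features - features[i], axis=1)  # contrast to every region
        w = sizes.astype(float).copy()
        w[i] = 0.0                                          # exclude the region itself
        sal[i] = (w * d).sum() / w.sum()
    # normalize to [0, 1] for map integration
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```

A region whose features stand out from large surrounding regions receives the highest score, which matches the contrast intuition in the abstract.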
3D Convolutional Neural Networks for Brain Tumor Segmentation: A Comparison of Multi-resolution Architectures
This paper analyzes the use of 3D convolutional neural networks for brain tumor segmentation in MR images. We address the problem using three different architectures that combine fine and coarse features to obtain the final segmentation. We compare three different networks that use multi-resolution features in terms of both design and performance, and we show that they improve on their single-resolution counterparts.
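The fine/coarse feature combination these architectures share can be illustrated with a minimal two-path sketch. This is an assumption-laden toy (2D instead of 3D, pooling instead of strided convolutions, and invented function names), meant only to show the structural idea of fusing resolutions before the final segmentation layer.

```python
import numpy as np

def downsample2(x):
    # 2x average pooling: a crude stand-in for a strided convolutional path.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean((1, 3))

def upsample2(x):
    # Nearest-neighbour upsampling back to the fine resolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def multires_features(x):
    """Sketch of multi-resolution fusion: full-resolution (fine) features are
    stacked with upsampled features from a coarser path, so a final layer sees
    both local detail and wider context (2D here for brevity; the paper is 3D)."""
    fine = x                                # fine path: detail at input resolution
    coarse = upsample2(downsample2(x))      # coarse path: context, lower detail
    return np.stack([fine, coarse], axis=-1)
```

The coarse channel is constant over each 2x2 block, which is exactly the loss of detail the fine path compensates for.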
Action tube extraction based 3D-CNN for RGB-D action recognition
In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes a video as input and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD, and its role is to define the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and is designed to remove frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: (1) a higher ratio of region of interest (the action subjects) to background; and (2) most frames contain obvious motion change. We propose to use a two-stream (RGB and depth) I3D architecture as our 3D-CNN model. Our approach outperforms state-of-the-art methods on the OA and NTU RGB-D datasets. © 2018 IEEE.
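The SSIM-based temporal sampling step can be sketched as follows: a frame is kept only if it is sufficiently dissimilar (low SSIM) from the last kept frame. This is a minimal sketch under simplifying assumptions: a global single-window SSIM rather than the usual windowed version, and an illustrative threshold; the paper's exact procedure may differ.

```python
import numpy as np

def ssim(a, b, c1=0.01**2, c2=0.03**2):
    # Global (single-window) SSIM between two grayscale frames in [0, 1].
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

def temporal_sample(frames, thresh=0.95):
    """Drop frames without obvious motion: keep a frame only if its SSIM to
    the last kept frame falls below a similarity threshold."""
    kept = [frames[0]]
    for f in frames[1:]:
        if ssim(kept[-1], f) < thresh:
            kept.append(f)
    return kept
```

A static clip collapses to a single frame, while a clip with motion in every frame is kept whole, which is the property the abstract claims for the extracted tube.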
Layer-wise training for self-supervised learning on graphs
End-to-end training of graph neural networks (GNNs) on large graphs presents several memory and computational challenges, and limits the application to shallow architectures, as depth exponentially increases the memory and space complexities. In this manuscript, we propose Layer-wise Regularized Graph Infomax, an algorithm to train GNNs layer by layer in a self-supervised manner. We decouple the feature propagation and feature transformation carried out by GNNs to learn node representations, in order to derive a loss function based on the prediction of future inputs. We evaluate the algorithm on large inductive graphs and show performance similar to other end-to-end methods with substantially increased efficiency, which enables the training of more sophisticated models on a single device. We also show that our algorithm avoids oversmoothing of the representations, another common challenge of deep GNNs.
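The layer-by-layer scheme described above can be sketched structurally: each layer is trained with a local self-supervised objective and then frozen, so only one layer's parameters are in memory at a time. The loss below (predicting the propagated output) is a crude stand-in for the paper's regularized infomax objective, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    # Symmetrically normalized adjacency with self-loops (GCN-style).
    A = A + np.eye(len(A))
    d = A.sum(1)
    return A / np.sqrt(np.outer(d, d))

def propagate(A_hat, H):
    # Feature propagation, decoupled from the learnable feature transformation.
    return A_hat @ H

def train_layerwise(X, A_hat, dims, steps=100, lr=0.05):
    """Greedy layer-wise training sketch: fit each layer's weights with a local
    self-supervised loss (here: predict the layer's own propagated output, a
    stand-in for the paper's objective), freeze them, and feed the propagated
    output to the next layer. No gradients ever flow across layers."""
    H = X
    weights = []
    for d in dims:
        W = rng.normal(scale=0.1, size=(H.shape[1], d))
        for _ in range(steps):
            Z = np.tanh(H @ W)
            target = propagate(A_hat, Z)                  # treated as constant (stop-gradient)
            G = H.T @ ((Z - target) * (1 - Z**2)) / len(H)
            W -= lr * G
        weights.append(W)
        H = propagate(A_hat, np.tanh(H @ W))              # frozen output feeds the next layer
    return H, weights
```

Because each inner loop touches only one weight matrix, peak memory is bounded by the widest single layer rather than by network depth, which is the efficiency argument in the abstract.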
Feature propagation as self-supervision signals on graphs
Self-supervised learning is gaining considerable attention as a way to avoid the need for extensive annotations in representation learning on graphs. Current algorithms are based on contrastive learning, which is computationally and memory expensive, and on the assumption of invariance under certain graph augmentations. However, graph transformations such as edge sampling may modify the semantics of the data, so the invariance assumption may not hold. We introduce Regularized Graph Infomax (RGI), a simple yet effective framework for node-level self-supervised learning that trains a graph neural network encoder by maximizing the mutual information between output node embeddings and their propagation through the graph, which encode the nodes' local and global context, respectively. RGI does not use graph data augmentations, instead generating self-supervision signals through feature propagation; it is non-contrastive and does not depend on a two-branch architecture. We run RGI in both transductive and inductive settings on popular graph benchmarks and show that it can achieve state-of-the-art performance despite its simplicity.
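The self-supervision signal described above, propagated embeddings as targets instead of augmented views, can be sketched very compactly. This is not RGI's exact objective (the mutual-information maximization and its regularizers are not reproduced); it only illustrates how feature propagation alone, with no augmentations and a single branch, yields a training target.

```python
import numpy as np

def row_normalize(A):
    # Random-walk normalized adjacency with self-loops.
    A = A + np.eye(len(A))
    return A / A.sum(1, keepdims=True)

def propagate(A_hat, Z, k=2):
    # k-step feature propagation: each node aggregates its k-hop context.
    for _ in range(k):
        Z = A_hat @ Z
    return Z

def rgi_style_loss(Z, A_hat):
    """Non-contrastive self-supervision in the spirit of RGI (a sketch, not
    the paper's objective): pull each node embedding toward its own propagation
    through the graph. No augmentations, no negatives, no second branch."""
    target = propagate(A_hat, Z)
    return ((Z - target) ** 2).mean()
```

Note that this bare prediction loss admits trivially smooth solutions; the "Regularized" part of RGI exists precisely to rule those out.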
Monte-Carlo sampling applied to multiple instance learning for whole slide image classification
In this paper we propose a patch sampling strategy based on sequential Monte-Carlo methods for whole slide image classification in the context of multiple instance learning, and show its capability to achieve high generalization performance in differentiating between sun-exposed and non-sun-exposed pieces of skin tissue.
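A sequential Monte-Carlo patch sampler can be sketched as a weight-resample-jitter loop over candidate patch locations. Everything here is an illustrative assumption, the informativeness score, the proposal noise, and the function names; the paper's actual procedure and scoring model are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_patch_sampling(score_map, n_patches=8, iters=3, sigma=1.0):
    """Sequential Monte-Carlo sketch: sample candidate patch centers, weight
    them by an informativeness score, resample in proportion to the weights,
    and jitter (the proposal step), so patches concentrate on high-score
    tissue regions over iterations."""
    h, w = score_map.shape
    centers = np.stack([rng.integers(0, h, n_patches),
                        rng.integers(0, w, n_patches)], axis=1).astype(float)
    for _ in range(iters):
        ij = np.clip(centers.round().astype(int), 0, [h - 1, w - 1])
        weights = score_map[ij[:, 0], ij[:, 1]] + 1e-8     # importance weights
        weights /= weights.sum()
        idx = rng.choice(n_patches, size=n_patches, p=weights)  # resampling step
        centers = centers[idx] + rng.normal(0, sigma, (n_patches, 2))  # proposal jitter
    return np.clip(centers.round().astype(int), 0, [h - 1, w - 1])
```

In a MIL pipeline, the patches cut at the returned centers would form the bag fed to the instance-level classifier.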
Brain MRI super-resolution using generative adversarial networks
In this work we propose an adversarial learning approach to generate high-resolution MRI scans from low-resolution images. The architecture, based on the SRGAN model, adopts 3D convolutions to exploit volumetric information. For the discriminator, the adversarial loss uses least squares in order to stabilize training. For the generator, the loss function is a combination of a least-squares adversarial loss and a content term based on mean square error and image gradients, in order to improve the quality of the generated images. We explore different solutions for the upsampling phase. We present promising results that improve on classical interpolation, showing the potential of the approach for 3D medical imaging super-resolution.
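The composite generator objective described above can be written down directly: pixel MSE, a gradient-difference content term, and a least-squares adversarial term. The sketch below is 2D for brevity (the paper uses 3D volumes) and the loss weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

def image_gradients(x):
    # Finite-difference gradients along the two spatial axes.
    return np.diff(x, axis=0), np.diff(x, axis=1)

def generator_loss(fake, real, d_fake, w_adv=1e-3, w_grad=1.0):
    """Composite generator loss sketch: pixel MSE + gradient-difference
    content term + least-squares adversarial term pushing the discriminator's
    scores on generated images toward 1 (the 'real' label)."""
    mse = ((fake - real) ** 2).mean()
    gfx, gfy = image_gradients(fake)
    grx, gry = image_gradients(real)
    grad = ((gfx - grx) ** 2).mean() + ((gfy - gry) ** 2).mean()
    adv = ((d_fake - 1.0) ** 2).mean()    # least-squares GAN generator term
    return mse + w_grad * grad + w_adv * adv
```

The gradient term penalizes blurred edges that plain MSE tolerates, which is the stated motivation for adding it to the content loss.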
Picking groups instead of samples: a close look at Static Pool-based Meta-Active Learning
©2019 IEEE. Active learning techniques are used to tackle learning problems where obtaining training labels is costly. In this work we use meta-active learning to learn to select a subset of samples from a pool of unlabeled inputs for further annotation. This scenario is called static pool-based meta-active learning. We propose to extend existing approaches by performing the selection in a manner that, unlike previous works, conditions the selection of each sample on the whole selected subset.
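The distinction the abstract draws, scoring each candidate conditioned on the whole subset selected so far rather than independently, can be illustrated with a simple greedy criterion. The farthest-point rule below is only a hand-crafted stand-in for the learned meta-selection policy; the function name and the deterministic starting point are assumptions.

```python
import numpy as np

def select_subset(X, k):
    """Greedy subset-aware selection sketch: each new sample is the one
    farthest from the *entire* subset chosen so far (farthest-point sampling),
    so every pick depends on all previous picks, unlike per-sample scoring."""
    chosen = [0]  # deterministic start for illustration
    for _ in range(k - 1):
        # distance from every pool point to its nearest already-chosen point
        d = np.min(np.linalg.norm(X[:, None] - X[chosen][None], axis=2), axis=1)
        d[chosen] = -1.0              # never re-pick a chosen sample
        chosen.append(int(d.argmax()))
    return chosen
```

With two tight clusters and an outlier, the rule picks one representative per group instead of two near-duplicates, which independent per-sample scoring could not guarantee.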
BCN20000: dermoscopic lesions in the wild
This article summarizes the BCN20000 dataset, composed of 19,424 dermoscopic images of skin lesions captured from 2010 to 2016 in the facilities of the Hospital Clínic in Barcelona. With this dataset, we aim to study the problem of unconstrained classification of dermoscopic images of skin cancer, including lesions found in hard-to-diagnose locations (nails and mucosa), large lesions which do not fit in the aperture of the dermoscopy device, and hypo-pigmented lesions. BCN20000 will be provided to the participants of the ISIC Challenge 2019 [8], where they will be asked to train algorithms to classify dermoscopic images of skin cancer automatically.